IMS Lecture Notes— Monograph Series 

Recent Developments in Nonparametric Inference and Probability 

Vol. 50 (2006) 156-163

© Institute of Mathematical Statistics 2006 
DOI: 10.1214/074921706000000662

A test for equality of multinomial 
distributions vs increasing convex order 

Arthur Cohen, John Kolassa and Harold Sackrowitz

Rutgers University 

Abstract: Recently Liu and Wang derived the likelihood ratio test (LRT) 
statistic and its asymptotic distribution for testing equality of two multino- 
mial distributions vs. the alternative that the second distribution is larger in 
terms of increasing convex order (ICX). ICX is less restrictive than stochastic 
order and is a notion that has found applications in insurance and actuarial 
science. In this paper we propose a new test for ICX. The new test has sev- 
eral advantages over the LRT and over any test procedure that depends on 
asymptotic theory for implementation. The advantages include the following: 

(i) The test is exact (non-asymptotic). 

(ii) The test is performed by conditioning on marginal column totals (and 
row totals in a full multinomial model for a 2 x C table).

(iii) The test has desirable monotonicity properties. That is, the test is 
monotone in all practical directions (to be formally defined). 

(iv) The test can be carried out computationally with the aid of a computer 
program. 

(v) The test has good power properties among a wide variety of possible 
alternatives. 

(vi) The test is admissible. 

The basis of the new test is the directed chi-square methodology developed 
by Cohen, Madigan, and Sackrowitz. 



1. Introduction 

Recently, Liu and Wang [8] derived the likelihood ratio test (LRT) statistic and
its asymptotic distribution for testing equality of two multinomial distributions
vs. the alternative that the second distribution is larger in terms of increasing
convex order (ICX). See also Liu and Wang [7]. A formal definition of ICX is as
follows: the distribution of a random variable Y is larger than the distribution of
a random variable X in the increasing convex order, i.e. $X \le_{icx} Y$, if and only
if $E\{f(X)\} \le E\{f(Y)\}$ holds for all non-decreasing convex functions $f$ for which
expectations are defined. ICX is less restrictive than stochastic order and is a notion
that has found applications in insurance and actuarial science. See, for example,
Goovaerts, Kaas, Van Heerwaarden and Bauwelinckx and other references cited
by Liu and Wang [8]. In this paper we propose a new test for ICX. The new test
has several advantages over the LRT and over any test procedure that depends on 
asymptotic theory for implementation. The advantages include the following: 

(i) The test is exact (non-asymptotic). It can be implemented regardless of
sample sizes.

(ii) The test is performed by conditioning on marginal column totals (and row
totals in a full multinomial model for a 2 x C table). Conditioning enables the null
hypothesis to be expressed as a simple null, and can be carried out by calculating
conditional P-values.

(iii) The test has desirable monotonicity properties. That is, the test is monotone
in all practical directions (to be formally defined). Intuitively, monotone in practical
directions means that if the test rejects for a sample point, say x, then it should
also reject for a sample point y where y empirically is more indicative of ICX than
x. The LRT is not monotone in all practical directions.

(iv) The test can be carried out with the aid of a computer program. 

(v) The test has good power properties among a wide variety of possible alter- 
natives. 

(vi) The test is admissible. 

The basis of the new test is the directed chi-square methodology developed by 
Cohen, Madigan and Sackrowitz [5].

In the next section we will state the formal model while defining ICX. We will 
also state the hypothesis and define practical directions. Furthermore, we deter- 
mine the practical directions for the ICX alternative. In Section 3, we offer the 
directed chi-square test statistic. Section 4 contains an example concerned with an 
age discrimination study. In this same section we offer a simulation study comparing 
powers of the new test with an exact version which uses the LRT statistic. Finally 
Section 5 contains a discussion regarding the importance of the monotonicity prop- 
erties. 

2. Models and definitions 

Consider a 2 x C contingency table under the product multinomial model. Assume
the C categories are ordered (worst to best; increasing age groups; etc.). Let $X_{ij}$,
$p_{ij}$, $i = 1, 2$; $j = 1, \dots, C$ represent cell frequencies and cell probabilities for the cell
$(i, j)$. Note $\sum_{j=1}^{C} X_{ij} = n_i$ are fixed, $\sum_{j=1}^{C} p_{ij} = 1$ for $i = 1, 2$, and let $X_{1j} + X_{2j} = t_j$
denote column totals, $j = 1, \dots, C$. Also let $N = n_1 + n_2$. Define log odds ratios as

(2.1) $\nu_j = \log\{p_{1j} p_{2C} / (p_{1C} p_{2j})\}$, $j = 1, \dots, C-1$.

Also $X = (X_1, X_2)'$, where $X_i = (X_{i1}, \dots, X_{iC})$, for $i = 1, 2$, and $\nu = (\nu_1, \dots, \nu_{C-1})'$.
Note $X$ is a 2C x 1 column vector and $\nu$ is a (C - 1) x 1 column vector. The
null hypothesis to be studied is $H : p_1 = p_2$. The alternative hypothesis is called
increasing convex order (ICX) and is defined as follows: Let $\lambda_1 > \lambda_2 > \cdots > \lambda_{C-1} > 0$
be (C - 1) given constants. Then the distribution with parameter $p_1$ is said to be
smaller in ICX than the distribution with $p_2$ as parameter if for $r = 1, \dots, C-1$,

(2.2) $\Delta_r = \lambda_r \sum_{j=1}^{r} (p_{1j} - p_{2j}) + \sum_{j=r+1}^{C-1} \lambda_j (p_{1j} - p_{2j}) \ge 0$.

This definition is essentially the same as the one given by Liu and Wang [8]. This is
an equivalent form of ICX for two multinomial distributions. Hence the alternative 
is denoted by 



(2.3) $K_{ICX} : \{p = (p_1, p_2) : \text{(2.2) holds}\} \setminus H$.
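
As an illustration of (2.2) and (2.3), the quantities $\Delta_r$ can be evaluated directly. The following is a minimal sketch, not part of the original computations; the function names `delta` and `in_k_icx` are hypothetical, and the inputs are the first alternative listed in Table 2 together with $\lambda_1 = 2$, $\lambda_2 = 1$ from Section 4. Exact rational arithmetic is used so that the sign checks are not affected by rounding.

```python
from fractions import Fraction

def delta(p1, p2, lam):
    """Return [Delta_1, ..., Delta_{C-1}] as defined in (2.2).

    p1, p2: length-C probability vectors; lam: the C-1 constants lambda_r.
    """
    C = len(p1)
    diff = [a - b for a, b in zip(p1, p2)]
    out = []
    for r in range(1, C):  # r = 1, ..., C-1
        head = lam[r - 1] * sum(diff[:r])                      # lambda_r * sum_{j<=r}
        tail = sum(lam[j] * diff[j] for j in range(r, C - 1))  # sum_{j=r+1}^{C-1}
        out.append(head + tail)
    return out

def in_k_icx(p1, p2, lam):
    """True if (p1, p2) satisfies (2.2) and is not in H (p1 == p2)."""
    return list(p1) != list(p2) and all(d >= 0 for d in delta(p1, p2, lam))

F = Fraction
p1 = [F(1, 10), F(6, 10), F(3, 10)]
p2 = [F(2, 10), F(1, 10), F(7, 10)]
lam = [2, 1]
print(delta(p1, p2, lam))     # Delta_1 = 3/10, Delta_2 = 2/5
print(in_k_icx(p1, p2, lam))  # True
```

Both $\Delta_r$ are nonnegative here, so this pair lies in $K_{ICX}$; exchanging the roles of $p_1$ and $p_2$ reverses the signs.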

Our approach to testing is to condition on the column totals (row totals as
well if the model is full multinomial) since these totals are the complete sufficient
statistics under H. We let $m = (t_1, \dots, t_C)$ denote these sufficient statistics (under
the full multinomial model $m = (n_1, n_2, t_1, \dots, t_C)$). The conditional distribution
of $X^{(1)} = (X_{11}, \dots, X_{1(C-1)})'$ given $m$ is the multivariate extended hypergeometric
distribution, which in exponential family form is

(2.4) $f(x^{(1)}; \nu) = h_m(x^{(1)}) \beta_m(\nu) e^{x^{(1)\prime} \nu}$.

See 0]. For the conditional problem, H becomes $H^* : \nu_1 = \nu_2 = \cdots = \nu_{C-1} = 0$.
In order to specify the appropriate alternative when m is fixed we need 

Lemma 2.1. Let $Q^- = \{\nu \in R^{C-1} : \nu_j \le 0, \text{ all } j = 1, \dots, C-1\}$. Consider the
set

(2.5) $\Gamma = \{R^{C-1} \setminus Q^-\} \setminus \{0\}$.

Given any $\nu \in \Gamma$, there exists some $p(\nu)$ satisfying (2.2). Furthermore if $\nu \in Q^-$,
there is no p satisfying (2.2).

Proof. See Appendix. □ 

In light of Lemma 2.1, for the conditional problem we take the alternative to be 

$K^*_{ICX} : \{\nu : \nu \in \Gamma\} \setminus H^*$.

Now let $\phi(x)$ denote a test function; i.e., $\phi(x)$ is the probability that the test
rejects H for an observed sample point x.

Definition 2.1. A test $\phi(x)$ is said to be monotone in the direction $\xi = (\xi_{11}, \dots, \xi_{2C})'$
if and only if

(2.6) $\phi(x) \le \phi(x + \gamma \xi)$,

for every $\gamma > 0$.

Since we will do testing by conditioning on m, and since $n_1$, $n_2$ are fixed, hereafter
we only consider directions such that

(2.7) $\xi_{1j} + \xi_{2j} = 0$, $j = 1, \dots, C$, and $\sum_{j=1}^{C} \xi_{ij} = 0$, $i = 1, 2$.

At this point let $\hat{p}_{ij} = x_{ij}/n_i$ and consider the vector $\hat{p} = (\hat{p}_{11}, \dots, \hat{p}_{2C})'$. Let

(2.8) $\Delta_r^*(x) = \Delta_r(\hat{p})$.

Definition 2.2. A direction d is said to be a practical direction if

(2.9) $\Delta_r^*(x + d) \ge \Delta_r^*(x)$, for $r = 1, \dots, C-1$.

An interpretation of a practical direction is that the empirical distributions are
becoming more ICX. Note that if a test function is monotone in directions $d_1$ and
$d_2$ (see (2.6)), it is monotone in the direction $a_1 d_1 + a_2 d_2$ as long as $a_1 \ge 0$, $a_2 \ge 0$.
This implies that the collection of practical directions for which $\phi$ is to be monotone
generates a closed convex polyhedral cone $\mathcal{C}$. Using (2.2), (2.7), (2.8), and (2.9) we
may express $\mathcal{C}$ as follows:

(2.10) $\mathcal{C} = \{d : Bd = 0, Gd \ge 0\}$,

where B is a (C + 1) x 2C matrix expressing the constraints in (2.7) and
$G = (G_1, -(n_1/n_2) G_1)$ is a (C - 1) x 2C matrix with

(2.11) $G_1 = \frac{1}{n_1} \begin{pmatrix} \lambda_1 & \lambda_2 & \lambda_3 & \cdots & \lambda_{C-1} & 0 \\ \lambda_2 & \lambda_2 & \lambda_3 & \cdots & \lambda_{C-1} & 0 \\ \lambda_3 & \lambda_3 & \lambda_3 & \cdots & \lambda_{C-1} & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ \lambda_{C-1} & \lambda_{C-1} & \lambda_{C-1} & \cdots & \lambda_{C-1} & 0 \end{pmatrix}$.
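
To make (2.10) and (2.11) concrete, the sketch below builds the rows of $G_1$ and checks whether a given direction lies in the cone. It is an illustration only: the function names are hypothetical, and the direction d is an invented example in which one observation in row 1 moves from the highest to the lowest category while row 2 moves the opposite way. The sample sizes $n_1 = 30$, $n_2 = 23$ and $\lambda = (2, 1)$ are those of the 2 x 3 power study in Section 4. Because $\Delta_r^*$ is linear in x, the vector $Gd$ equals the change $\Delta_r^*(x + d) - \Delta_r^*(x)$.

```python
from fractions import Fraction

def lambda_rows(lam):
    """Rows of the matrix in (2.11) without the 1/n1 factor.

    Row r holds lambda_r in its first r entries, then
    lambda_{r+1}, ..., lambda_{C-1}, then a final 0.
    """
    C = len(lam) + 1
    return [[lam[r - 1]] * r + list(lam[r:]) + [0] for r in range(1, C)]

def g_times(d, lam, n1, n2):
    """Compute G d for G = (G1, -(n1/n2) G1), with d = (d_11..d_1C, d_21..d_2C)."""
    C = len(lam) + 1
    d1, d2 = d[:C], d[C:]
    return [
        sum(g * (Fraction(a, n1) - Fraction(b, n2)) for g, a, b in zip(row, d1, d2))
        for row in lambda_rows(lam)
    ]

def satisfies_27(d, C):
    """Constraints (2.7): column sums and row sums of the direction are zero."""
    d1, d2 = d[:C], d[C:]
    return all(a + b == 0 for a, b in zip(d1, d2)) and sum(d1) == 0 and sum(d2) == 0

def is_practical(d, lam, n1, n2):
    """Membership in the cone (2.10): B d = 0 and G d >= 0."""
    return satisfies_27(d, len(lam) + 1) and all(v >= 0 for v in g_times(d, lam, n1, n2))

lam, n1, n2 = [2, 1], 30, 23
d = [1, 0, -1, -1, 0, 1]
print(lambda_rows(lam))          # [[2, 1, 0], [1, 1, 0]]
print(g_times(d, lam, n1, n2))   # both components positive
print(is_practical(d, lam, n1, n2))
```

The reversed direction, obtained by negating every component of d, still satisfies (2.7) but fails $Gd \ge 0$.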



Remark. The same example used in Q can be used to demonstrate that the LRT 
is not monotone in all practical directions. 



3. Directed chi-square 



The directed chi-square statistic was introduced in [5]. The statistic is

(3.1) $\chi^2_D(x) = \sum_{i=1}^{2} \sum_{j=1}^{C} x_{ij}^{*2} / (n_i t_j) = \inf_{u \in A(x)} \sum_{i=1}^{2} \sum_{j=1}^{C} u_{ij}^2 / (n_i t_j)$,

where $u = (u_1, u_2)'$ is a 2C x 1 vector, $x^*$ is the minimizer of the sum on the
right-hand side of (3.1) and A(x) is a set in $R^{2C}$, depending on the data x and
determined by a set of linear equalities and linear inequalities. Namely,

(3.2) $A(x) = \{u \in R^{2C} : B(u - x) = 0, G(u - x) \ge 0\}$,

where B and G are specified in (2.10), (2.11). 
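
The objective in (3.1) is closely tied to the ordinary Pearson statistic: evaluated at u = x, the quadratic form $Q(x) = \sum_i \sum_j x_{ij}^2/(n_i t_j)$ satisfies $X^2 = N\{Q(x) - 1\}$, where $X^2$ is Pearson's chi-square for the table. Since x itself belongs to A(x), the infimum in (3.1) is at most Q(x). The sketch below only verifies this identity numerically; it does not solve the constrained minimization, which the text carries out with a quadratic programming routine. Function names are hypothetical, and the table is Sample Point 1 of Section 5.

```python
from fractions import Fraction

def margins(x):
    """Row totals n_i and column totals t_j of a two-way table."""
    n = [sum(row) for row in x]
    t = [sum(col) for col in zip(*x)]
    return n, t

def quad_form(u, n, t):
    """Objective in (3.1): sum_i sum_j u_ij^2 / (n_i t_j)."""
    return sum(
        Fraction(u[i][j] ** 2, n[i] * t[j])
        for i in range(len(n))
        for j in range(len(t))
    )

def pearson(x):
    """Pearson chi-square: sum (x_ij - e_ij)^2 / e_ij with e_ij = n_i t_j / N."""
    n, t = margins(x)
    N = sum(n)
    return sum(
        Fraction((N * x[i][j] - n[i] * t[j]) ** 2, N * n[i] * t[j])
        for i in range(len(n))
        for j in range(len(t))
    )

x = [[5, 11, 1], [3, 8, 4]]
n, t = margins(x)
N = sum(n)
print(pearson(x) == N * (quad_form(x, n, t) - 1))  # True
```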

The statistic $\chi^2_D$ can be determined by using an IMSL subroutine called
DQPROG. That is, given an observed value of x, call it $x_0$, determine $x^*$ of (3.1).
Next use the exact method of Pagano and Halvorsen [9] to generate all tables
consistent with the given m and the conditional probabilities under H of these tables.
Sum the probabilities of the sample points for which $\chi^2_D(x) > \chi^2_D(x_0)$ plus the
probabilities of the sample points for which $\chi^2_D(x) = \chi^2_D(x_0)$. The total probability
is the conditional P-value. If this P-value $\le \alpha$, reject H.
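
The P-value recipe above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' Fortran implementation: tables are enumerated by brute force rather than by the Pagano and Halvorsen algorithm, and the statistic `stand_in_stat` (the sum of the $\Delta_r^*$ with $\lambda = (2, 1)$) is a hypothetical placeholder for $\chi^2_D$, whose evaluation requires a quadratic program. Under H the conditional probability of a table with first row $x_1$ is the multivariate hypergeometric probability $\prod_j \binom{t_j}{x_{1j}} / \binom{N}{n_1}$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def tables(n1, t):
    """Yield every first row x1 with 0 <= x1j <= t_j and sum(x1) == n1."""
    for head in product(*(range(tj + 1) for tj in t[:-1])):
        last = n1 - sum(head)
        if 0 <= last <= t[-1]:
            yield head + (last,)

def null_prob(x1, t, n1):
    """Multivariate hypergeometric probability of first row x1 under H."""
    num = 1
    for xj, tj in zip(x1, t):
        num *= comb(tj, xj)
    return Fraction(num, comb(sum(t), n1))

def stand_in_stat(x1, t, n1, lam=(2, 1)):
    """Placeholder statistic (NOT chi^2_D): sum over r of Delta*_r(x), C = 3."""
    n2 = sum(t) - n1
    diff = [Fraction(a, n1) - Fraction(tj - a, n2) for a, tj in zip(x1, t)]
    C = len(t)
    return sum(
        lam[r - 1] * sum(diff[:r]) + sum(lam[j] * diff[j] for j in range(r, C - 1))
        for r in range(1, C)
    )

def exact_p_value(x_obs, stat):
    """Sum null probabilities of tables whose statistic is >= the observed one."""
    n1 = sum(x_obs[0])
    t = [a + b for a, b in zip(*x_obs)]
    s_obs = stat(tuple(x_obs[0]), t, n1)
    return sum(null_prob(x1, t, n1) for x1 in tables(n1, t) if stat(x1, t, n1) >= s_obs)

x_obs = [[5, 11, 1], [3, 8, 4]]   # Sample Point 1 of Section 5
print(float(exact_p_value(x_obs, stand_in_stat)))
```

Replacing `stand_in_stat` by a function that returns $\chi^2_D$ would give the conditional P-value described in the text; ties are counted toward the P-value, as in the rule above.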

The directed chi-square test is monotone in all practical directions. A proof of this 
is given in ||. The directed chi-square test is admissible. To show this, recognize 
first that the test for the ICX alternative is admissible for the stochastic order 
(SO) alternative, which is a smaller parameter set than the ICX alternative. The 
admissibility for the SO alternative follows from Theorem 4.3 of using the facts 
that (i) the test is monotone in $x_{11}$ while $x_{1k}$, $k = 2, \dots, C-1$, is fixed and

(ii) that the acceptance region of the test is convex. See 

Remark. Should there be several sample points yielding the same value of $\chi^2_D$, it
may be helpful to use a backup statistic as is done in

Table 1
Success and age in competition

Age        20-30   31-40   41-50   51-60   Totals
Success      1       6      19       4       30
Failure      3       1      11       8       23



4. Example and power comparison 

Barry and Boland [1] study the relationship of age and successful employment in
Ireland. Table 1 contains relevant data.

This is a reasonable example to consider ICX as opposed to stochastic ordering,
since a priori one might suspect that older people will have a smaller chance of
gaining employment than younger people, whereas at the young age groups one
would not expect much of a difference. Using $\lambda_1 = 3$, $\lambda_2 = 2$, $\lambda_3 = 1$ (see (2.2)), we
find the conditional P-value for this data set using the directed $\chi^2$-test is 0.10539.
For the LRT the corresponding P-value is 0.16575.

A study was conducted to compare exact conditional power of the directed $\chi^2$-test
with an exact test based on the LRT statistic. The study was based on the
marginal totals of Table 1 save that the first and second columns were
combined. Hence the problem is in terms of a 2 x 3 table with marginal column
totals of (11, 30, 12) and row totals (30, 23). Calculations were performed using
Fortran 90 and the IMSL mathematical library for nonlinear function minimization.
We took $\lambda_1 = 2$, $\lambda_2 = 1$. In order to calculate the constrained maximum likelihood
estimate under ICX order, IMSL subroutine DL2ONG was used to maximize the
likelihood (equivalently, to minimize its negative) subject to the linear constraints
(2.2) and $\sum_{j=1}^{C} p_{ij} = 1$, $i = 1, 2$. The
likelihood is simply the product of the two multinomial distributions. In addition
to the likelihood, its derivative was also supplied to DL2ONG in a separate
subroutine. The chi-square statistic (3.1) minimized under ICX order was obtained
using the IMSL routine DQPROG for minimizing a quadratic form under linear
constraints; the constraints are given in (3.2).

In order to calculate the P-value of the directed chi-square test and to calculate 
powers all tables with the same marginal totals as the observed table are enumer- 
ated. These tables, and their probabilities conditional on row and column sums, 
were calculated using the algorithm of Pagano and Halvorsen [9]. A conditional
P-value was calculated as the sum of table probabilities for which the test statistic 
was as large as or larger than that observed. Powers were calculated by reweighting 
the tables using the ratio of the likelihood under the alternative hypothesis to the 
likelihood under the null hypothesis, and summing the probabilities associated with 
tables whose test statistics were as large as or larger than those observed. Also the 
powers were adjusted so that test sizes are exactly 0.05. 
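
The reweighting step can be illustrated as follows. By (2.4), the ratio of the conditional likelihood under an alternative to that under the null is proportional to $e^{x^{(1)\prime}\nu}$, so conditional power is obtained by weighting each table's null weight $\prod_j \binom{t_j}{x_{1j}}$ by this factor and renormalizing. The sketch below is an assumption-laden illustration rather than the code used for Table 2: tables are enumerated by brute force, floating point arithmetic is used, the rejection region `reject` is an invented placeholder, and no adjustment of the size to exactly 0.05 is attempted. The margins are those of the 2 x 3 study and the alternative is the first row of Table 2.

```python
from itertools import product
from math import comb, exp, log

def log_odds(p1, p2):
    """Log odds ratios nu_j of (2.1), j = 1, ..., C-1."""
    C = len(p1)
    return [log(p1[j] * p2[C - 1] / (p1[C - 1] * p2[j])) for j in range(C - 1)]

def conditional_dist(t, n1, nu):
    """Extended hypergeometric pmf (2.4) over first rows x1, given the margins."""
    weights = {}
    for head in product(*(range(tj + 1) for tj in t[:-1])):
        last = n1 - sum(head)
        if 0 <= last <= t[-1]:
            x1 = head + (last,)
            h = 1
            for xj, tj in zip(x1, t):
                h *= comb(tj, xj)          # null weight h_m(x^(1))
            # reweight by exp(x^(1)' nu); only the first C-1 cells enter
            weights[x1] = h * exp(sum(a * b for a, b in zip(head, nu)))
    total = sum(weights.values())
    return {x1: w / total for x1, w in weights.items()}

def power(t, n1, nu, reject):
    """Conditional probability of the rejection region under nu."""
    return sum(pr for x1, pr in conditional_dist(t, n1, nu).items() if reject(x1))

t, n1 = [11, 30, 12], 30
nu = log_odds([0.10, 0.60, 0.30], [0.20, 0.10, 0.70])
reject = lambda x1: x1[2] <= 3   # hypothetical region, for illustration only
print(power(t, n1, [0.0, 0.0], reject))  # probability of the region under H
print(power(t, n1, nu, reject))          # probability under the alternative
```

With $\nu = 0$ the weights reduce to the multivariate hypergeometric distribution used for the P-value.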

Table 2 contains exact powers of the directed chi-square test and the exact test
performed conditionally using the unconditional LRT statistic. Various ICX
alternatives are considered. We note that the powers of the two tests are comparable.
The LRT is slightly better for some alternatives that are further from a null case
while the directed $\chi^2$ test is preferred for alternatives closer to a null case.

Table 2
Exact powers for the directed $\chi^2$ test and for the LRT

              Alternatives                           Powers
$p_{11}$  $p_{12}$  $p_{13}$  $p_{21}$  $p_{22}$  $p_{23}$    $\chi^2_D$   LRT
0.10      0.60      0.30      0.20      0.10      0.70        0.9747       0.9940
0.10      0.60      0.30      0.20      0.20      0.60        0.8941       0.8918
0.10      0.80      0.10      0.20      0.30      0.50        0.9591       0.9743
0.10      0.50      0.40      0.30      0.10      0.60        0.9790       0.9823
0.30      0.50      0.20      0.40      0.10      0.50        0.9640       0.9782
0.20      0.30      0.50      0.30      0.10      0.60        0.7050       0.7009
0.10      0.40      0.50      0.20      0.10      0.70        0.9163       0.9161
0.10      0.60      0.30      0.30      0.10      0.60        0.9863       0.9961
0.50      0.40      0.10      0.60      0.10      0.30        0.9526       0.9627
0.10      0.60      0.30      0.15      0.45      0.40        0.2278       0.2205
0.20      0.30      0.50      0.25      0.20      0.55        0.1934       0.1863
0.20      0.40      0.40      0.25      0.20      0.55        0.4464       0.4369
0.10      0.40      0.50      0.12      0.35      0.53        0.0862       0.0813
0.30      0.30      0.40      0.32      0.27      0.41        0.0701       0.0660
0.10      0.40      0.50      0.15      0.30      0.55        0.1689       0.1647

5. Discussion

One referee has misgivings about this paper because of our claim that monotonicity
in practical directions is an intuitively desirable property. The referee refers to
Perlman and Chaudhuri [11] where it is argued that such a property is not compelling
and that, since the likelihood ratio test does not have the property, it is undesirable. Our
reaction to this has been discussed in some detail in Cohen and Sackrowitz [4], a
paper that appears in the same year of the same journal as the paper by Perlman
and Chaudhuri [11].

Our take on the controversy is as follows: Likelihood inference is the default 
methodology in much of statistical inference where it is feasible. It has large sample 
optimality properties that are unsurpassed under very mild conditions. It generally 
has intuitive appeal as well. However in some order restricted inference problems 
likelihood inference has competitors that can have intuitive properties that likeli- 
hood procedures do not share. We offer one example here, borrowed from Cohen 
and Sackrowitz [2] and leave it to the reader to judge the intuitiveness of the mono- 
tonicity property we claim is desirable. See also a recent paper by Peddada, Dunson 
and Tan [10] which offers competitors to maximum likelihood estimators.

Example. Consider a 2 x 3 contingency table under the product multinomial
model. Let $X_{ij}$, $i = 1, 2$; $j = 1, 2, 3$ be cell frequencies and $p_{ij}$ be corresponding
cell probabilities. Test $H_0 : p_1 = p_2$ (where $p_i = (p_{i1}, p_{i2}, p_{i3})$, $\sum_{j=1}^{3} p_{ij} = 1$) vs
$H_1 : \{p_2 \ge_{st} p_1\} \setminus H_0$, where $\ge_{st}$ means the $p_2$ distribution is stochastically larger
than $p_1$, i.e., $p_{11} \ge p_{21}$ and $p_{11} + p_{12} \ge p_{21} + p_{22}$ with at least one strict inequality.
Note $p_2 \ge_{st} p_1$ implies $p_2 \ge_{icx} p_1$. Now consider the following two sample points:

Sample Point 1
Group      Worse   Same   Better   Total
Control      5      11       1       17
Treat.       3       8       4       15
Total        8      19       5       32

Sample Point 2
Group      Worse   Same   Better   Total
Control      0      16       1       17
Treat.       8       3       4       15
Total        8      19       5       32

Our intuition suggests that the conditional p-value (given marginal totals fixed)
should be smaller for sample point 1 than for sample point 2. Yet the p-value using
the likelihood ratio statistic is 0.169 for sample point 1 and 0.019 for sample point 2.

We feel that blanket statements that claim monotonicities are always desirable
or always undesirable should not be made. Such considerations should be made
on a case-by-case basis.



Appendix

Proof of Lemma 2.1. Recognize that $\Gamma$ is the set of $\nu$'s such that at least one
component of $\nu$ is greater than zero. Let $\nu_q > 0$ for some $1 \le q \le C-1$. Now let

$p_{1j} = M_1 e^{a_j}$, $j = 1, \dots, C$ but not $j = q$ or $j = C$,
$p_{1q} = M_1 A e^{a_q}$, $p_{1C} = M_1 A e^{a_C}$,
$p_{2j} = M_2 e^{b_j}$, $j = 1, \dots, C$ but not $j = q$ or $j = C$,
$p_{2q} = M_2 A e^{b_q}$, $p_{2C} = M_2 A e^{b_C}$.

The constants $a = (a_1, \dots, a_C)$, $b = (b_1, \dots, b_C)$ are as follows:

$a_j = 3\nu_j/2$, $j = 1, \dots, q-1, q+1, \dots, C-1$,
$a_q = 0$, $a_C = \nu_1/2$,
$b_1 = 0$, $b_j = (\nu_j - \nu_1)/2$, $j = 2, \dots, q-1, q+1, \dots, C-1$,
$b_q = -\nu_q - \nu_1/2$, $b_C = 0$.

This choice of constants yields the given $\nu$'s, since $\nu_j = a_j + b_C - a_C - b_j$ by (2.1).
The constants $M_1$ and $M_2$ are determined by the fact that $\sum_j p_{ij} = 1$, $i = 1, 2$.
We now verify that this choice of $p(\nu)$
satisfies (2.2) for some A. First let r = 1, so that we must show

(A.1) $\frac{\sum_{j=1}^{q-1} \lambda_j e^{a_j} + \sum_{j=q+1}^{C-1} \lambda_j e^{a_j} + \lambda_q A e^{a_q}}{\sum_{j=1}^{q-1} e^{a_j} + \sum_{j=q+1}^{C-1} e^{a_j} + A(e^{a_q} + e^{a_C})} \ge \frac{\sum_{j=1}^{q-1} \lambda_j e^{b_j} + \sum_{j=q+1}^{C-1} \lambda_j e^{b_j} + \lambda_q A e^{b_q}}{\sum_{j=1}^{q-1} e^{b_j} + \sum_{j=q+1}^{C-1} e^{b_j} + A(e^{b_q} + e^{b_C})}$.

We will let $A \to \infty$ so that from (A.1) it suffices to show

(A.2) $e^{a_q}(e^{b_q} + e^{b_C}) > e^{b_q}(e^{a_q} + e^{a_C})$,

which reduces to

(A.3) $e^{a_q + b_C} > e^{a_C + b_q}$

or

(A.4) $a_q + b_C > a_C + b_q$.

However $\nu_q = a_q + b_C - a_C - b_q > 0$ by hypothesis. This shows (2.2) for r = 1. For
$2 \le r \le C-1$ the argument is essentially the same.

To complete the lemma we need to show that if all $\nu$'s are negative then no
$p(\nu)$ satisfies (2.2). But for $r = C-1$, (2.2) reduces to $p_{2C} \ge p_{1C}$. If this is the case then for
some j, $j = 1, \dots, C-1$, $p_{1j} \ge p_{2j}$, implying that $\nu_j \ge 0$.